CDS 6324 - Data Visualization

Lecture 7: Interactive Visualization

1. Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) is the process of examining data before building final visualizations or conducting detailed analysis.
Understand your data before designing a visualization.
EDA was introduced by John Tukey in his book Exploratory Data Analysis (1977).

2. Why EDA Matters

EDA helps reveal patterns hidden inside raw data.
Raw tables are difficult to interpret. Visual exploration makes patterns easier to discover.
🧠 EDA = Explore Before Explain

3. What EDA Helps You Discover

Goal              Purpose
Outliers          Detect unusual values
Distributions     Check normality or skewness
Correlations      Identify relationships
Data Quality      Find errors or missing values
Transformations   Suggest log or other transformations

4. Common EDA Visualizations

Humans are excellent pattern recognizers.

5. Important Statistical Measures

Measure     Description
Mean        Average value
Median      Middle value of the sorted data
Mode        Most frequent value
Range       Max − Min
Variance    Average squared deviation from the mean (spread of values)
Skewness    Measure of asymmetry of the distribution
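The measures in the table can be computed with Python's standard `statistics` module. This is a minimal sketch on a small hypothetical sample. `statistics` has no skewness function, so the `skewness` helper below is an illustration using the population moment formula; other software may apply a sample-size correction and report slightly different values.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)            # 40 / 8 = 5
median = statistics.median(data)        # average of the two middle values: 4.5
mode = statistics.mode(data)            # 4 occurs most often
value_range = max(data) - min(data)     # 9 - 2 = 7
variance = statistics.pvariance(data)   # population variance: 32 / 8 = 4


def skewness(values):
    """Population moment skewness: mean cubed deviation divided by sd**3.

    A positive result means a longer right tail; negative means a longer left tail.
    """
    m = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum((x - m) ** 3 for x in values) / (len(values) * sd ** 3)


print(mean, median, mode, value_range, variance, round(skewness(data), 3))
```

Here the skewness is about 0.656: the value 9 stretches the right tail, so the distribution is right-skewed.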

6. Quartiles and IQR

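Quartiles (Q1, Q2 = median, Q3) divide sorted data into four parts containing equal numbers of observations.
Interquartile Range (IQR) = Q3 − Q1
IQR is commonly used to detect outliers: Tukey's rule flags values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.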
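A minimal sketch of IQR-based outlier detection using `statistics.quantiles` from Python's standard library. Several quartile conventions exist; `quantiles` uses its "exclusive" method by default, so Q1 and Q3 may differ slightly from other tools. The data and the helper name `tukey_outliers` are hypothetical.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences) and (Q1, Q3, IQR)."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high], (q1, q3, iqr)


data = [1, 3, 4, 5, 5, 6, 7, 8, 9, 30]  # hypothetical sample
outliers, (q1, q3, iqr) = tukey_outliers(data)
print(q1, q3, iqr, outliers)  # 3.75 8.25 4.5 [30]
```

The fences here are 3.75 − 6.75 = −3.0 and 8.25 + 6.75 = 15.0, so only 30 is flagged.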

7. EDA in a Nutshell

Always inspect every variable in your dataset.
🧠 Bottom Line: Always look at your data before modeling it.

8. What is Interaction?

Interaction between people and machines requires mutual intelligibility or shared understanding.
Interactive visualization allows users to actively explore and manipulate visual representations of data.

9. Taxonomy of Interaction

Category                    Techniques
Data & View Specification   Visualize, Filter, Sort, Derive
View Manipulation           Select, Navigate, Coordinate, Organize
Process & Provenance        Record, Annotate, Share, Guide

10. Data & View Specification

Allows users to determine what data is displayed and how it is displayed.
🧠 Choose the Data + Choose the View
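Three of the Data & View Specification techniques (Filter, Sort, Derive) act on the data before it is drawn. A minimal sketch on a hypothetical sales table with made-up field names; the fourth technique, Visualize, would then map the resulting rows to a chart.

```python
# Hypothetical table: each dict is one record.
rows = [
    {"name": "a", "price": 120, "qty": 3},
    {"name": "b", "price": 50, "qty": 4},
    {"name": "c", "price": 100, "qty": 5},
]


def derive(records, field, fn):
    """Derive: add a computed attribute to every record (originals untouched)."""
    return [{**r, field: fn(r)} for r in records]


def filter_rows(records, predicate):
    """Filter: keep only records matching the user's condition."""
    return [r for r in records if predicate(r)]


def sort_rows(records, field, descending=False):
    """Sort: order records by one attribute."""
    return sorted(records, key=lambda r: r[field], reverse=descending)


view_data = sort_rows(
    filter_rows(derive(rows, "revenue", lambda r: r["price"] * r["qty"]),
                lambda r: r["revenue"] > 300),
    "revenue", descending=True)
print([(r["name"], r["revenue"]) for r in view_data])  # [('c', 500), ('a', 360)]
```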

11. View Manipulation

Allows users to interact directly with visual representations.

12. Process & Provenance

Supports tracking and communicating analysis progress.

13. Reorderable Matrix

Rows and columns of a matrix can be reordered (permuted) so that similar items end up next to each other.
Rearranging the data this way often reveals hidden structures and relationships that the original order conceals.
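To make permutation concrete, here is a minimal sketch, assuming a small binary matrix with made-up row and column labels. The ordering heuristic (sort columns, then rows, by their 0/1 patterns) is only an illustration; dedicated reordering algorithms exist and generally do better on real data.

```python
def reorder(matrix, row_labels, col_labels):
    """Permute columns, then rows, so that similar 0/1 patterns end up adjacent.

    Illustrative heuristic only: sort columns by their column vector, then rows
    by their (already permuted) row vector, both in descending order.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    col_order = sorted(range(n_cols),
                       key=lambda j: tuple(matrix[i][j] for i in range(n_rows)),
                       reverse=True)
    permuted = [[row[j] for j in col_order] for row in matrix]
    row_order = sorted(range(n_rows), key=lambda i: tuple(permuted[i]), reverse=True)
    return ([permuted[i] for i in row_order],
            [row_labels[i] for i in row_order],
            [col_labels[j] for j in col_order])


# Hypothetical matrix whose structure is hidden by an interleaved ordering.
m = [[1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1]]
ordered, rows, cols = reorder(m, ["A", "B", "C", "D"], ["w", "x", "y", "z"])
for label, row in zip(rows, ordered):
    print(label, row)
# A [1, 1, 0, 0]
# C [1, 1, 0, 0]
# B [0, 0, 1, 1]
# D [0, 0, 1, 1]
```

After reordering (columns w, y, x, z), the checkerboard becomes two clear blocks, which is exactly the kind of hidden structure the reorderable matrix is meant to expose.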

14. Matrix Files & Image Files

Technique     Purpose
Image File    Represent ordered objects visually
Matrix File   Handle very large dimensions
Sorting is used to discover correlations.

15. Selection Techniques

Type               Example
Point Selection    Mouse click, hover, tap
Region Selection   Lasso, rubber-band selection
🧠 Point = One Item, Region = Many Items
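The difference between point and region selection can be sketched as two hit-testing functions over a scatterplot's item positions. This is a minimal illustration with hypothetical item ids and screen coordinates; the click `tolerance` is an assumed parameter, and a lasso would replace the rectangle test with a point-in-polygon test.

```python
import math

# Hypothetical scatterplot items: id -> (x, y) screen position in pixels.
items = {"a": (10, 10), "b": (50, 40), "c": (52, 45), "d": (90, 80)}


def point_select(items, click, tolerance=5.0):
    """Point selection: the single item nearest the click, if within `tolerance` px."""
    best, best_dist = None, tolerance
    for item_id, pos in items.items():
        d = math.dist(pos, click)
        if d <= best_dist:
            best, best_dist = item_id, d
    return best


def region_select(items, corner1, corner2):
    """Region (rubber-band) selection: every item inside the dragged rectangle."""
    x0, x1 = sorted((corner1[0], corner2[0]))
    y0, y1 = sorted((corner1[1], corner2[1]))
    return {item_id for item_id, (x, y) in items.items()
            if x0 <= x <= x1 and y0 <= y <= y1}


print(point_select(items, (51, 41)))                     # b (one item)
print(point_select(items, (30, 30)))                     # None (nothing close enough)
print(sorted(region_select(items, (60, 50), (40, 30))))  # ['b', 'c'] (many items)
```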

16. Brushing and Linking

Selecting data in one view automatically highlights related data in another view.
Multiple visualizations become connected.
Example: select players with high salaries in one chart and see them highlighted in other baseball statistics charts.
Brushing and linking is one of the most important interaction techniques.
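At its core, brushing and linking means every view reads from one shared selection. The sketch below assumes hypothetical player records (ids, salaries in millions, and statistics are all made up) and a hypothetical `LinkedSelection` class; a real toolkit would also redraw each view, which is omitted here.

```python
class LinkedSelection:
    """Shared selection: every view decides what to highlight from the same id set."""

    def __init__(self, records):
        self.records = records   # id -> dict of attributes
        self.selected = set()    # ids currently brushed

    def brush(self, field, low, high):
        """Brushing in one view: select records with low <= field <= high."""
        self.selected = {rid for rid, rec in self.records.items()
                         if low <= rec[field] <= high}

    def view(self, x_field, y_field):
        """Any other view: (id, x, y, highlighted) marks, highlighted via the shared set."""
        return [(rid, rec[x_field], rec[y_field], rid in self.selected)
                for rid, rec in sorted(self.records.items())]


# Hypothetical players; salary in millions. All values are made up.
players = {
    "p1": {"salary": 30.0, "home_runs": 40, "batting_avg": 0.290},
    "p2": {"salary": 5.0, "home_runs": 12, "batting_avg": 0.250},
    "p3": {"salary": 25.0, "home_runs": 35, "batting_avg": 0.310},
    "p4": {"salary": 1.0, "home_runs": 3, "batting_avg": 0.220},
}
link = LinkedSelection(players)
link.brush("salary", 20.0, 40.0)                    # brush high salaries in a salary chart
for mark in link.view("home_runs", "batting_avg"):  # linked scatterplot
    print(mark)
```

Because the views share one selection set rather than copying it, any number of additional views stay consistent automatically.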

17. Linked Highlighting

Highlighting selected data across multiple visualizations simultaneously.
Helps compare patterns across different views.

18. Dynamic Queries

Interactive filtering where results update immediately as controls are adjusted.
Example: adjust a price slider and instantly see the matching houses.
🧠 Dynamic Queries = Instant Feedback
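A dynamic query re-runs the filter on every control change instead of waiting for a submitted query. This minimal sketch assumes hypothetical `House` records and a hypothetical `DynamicQuery` class whose `set_range` method stands in for a slider callback; `on_update` stands in for redrawing the display.

```python
from dataclasses import dataclass


@dataclass
class House:
    name: str
    price: int     # hypothetical price in dollars
    bedrooms: int


class DynamicQuery:
    """Re-filters on every control change and immediately reports the result."""

    def __init__(self, items, on_update):
        self.items = items
        self.on_update = on_update   # stands in for redrawing the display
        self.ranges = {}             # field -> (low, high) from each slider

    def set_range(self, field, low, high):
        """Slider callback: fires on every movement, not on a 'Submit' click."""
        self.ranges[field] = (low, high)
        self.on_update(self.results())

    def results(self):
        # An item matches only if it satisfies every active range (conjunction).
        return [item for item in self.items
                if all(low <= getattr(item, f) <= high
                       for f, (low, high) in self.ranges.items())]


houses = [House("h1", 250_000, 2), House("h2", 280_000, 3),
          House("h3", 400_000, 4), House("h4", 150_000, 3)]
history = []
query = DynamicQuery(houses, lambda res: history.append([h.name for h in res]))
query.set_range("price", 0, 300_000)   # drag the price slider
query.set_range("bedrooms", 3, 5)      # then narrow by bedrooms
print(history)  # [['h1', 'h2', 'h4'], ['h2', 'h4']]
```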

19. Problems with Text-Based Queries

20. Direct Manipulation

Users interact directly with visual objects rather than typing commands.
Easier and more intuitive than textual queries.

21. Dynamic Query Advantages & Disadvantages

Pros                 Cons
Easy for beginners   Limited query complexity
Fast exploration     Many controls may clutter the interface
Immediate feedback   Screen space limitations

22. Trellis Display

A framework that divides data into multiple panels based on categories.
Allows easy comparison across groups.
Example: split a scatterplot into panels by gender or political affiliation.
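A trellis display is essentially a group-by followed by one small plot per group. The sketch below, with hypothetical rows and field names, only computes the panel contents; it also computes axis limits shared by all panels, a common trellis convention that keeps the panels directly comparable.

```python
from collections import defaultdict


def trellis_panels(rows, panel_by, x, y):
    """Split rows into one panel per category, plus axis limits shared by all panels."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[panel_by]].append((row[x], row[y]))
    xs = [row[x] for row in rows]
    ys = [row[y] for row in rows]
    limits = ((min(xs), max(xs)), (min(ys), max(ys)))  # same scales in every panel
    return dict(sorted(panels.items())), limits


# Hypothetical rows; "group" plays the role of a category such as gender or affiliation.
rows = [
    {"group": "G1", "x": 1, "y": 2},
    {"group": "G2", "x": 3, "y": 5},
    {"group": "G1", "x": 2, "y": 4},
    {"group": "G2", "x": 4, "y": 1},
]
panels, limits = trellis_panels(rows, "group", "x", "y")
for name, pts in panels.items():
    print(name, pts, "axes:", limits)
```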

23. Big Data Visualization

Interactive visualization must remain responsive even with billions of records.
Two Major Challenges:
  • Effective visual encoding
  • Real-time interaction

24. Big Data Techniques

Technique     Purpose
Sampling      Use a subset of the data
Binning       Group values into intervals (bins)
Modeling      Represent patterns efficiently
Aggregation   Summarize large datasets
🧠 Don't visualize billions of records directly. Summarize first.
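Three of the techniques in the table (sampling, binning, and aggregation) can be sketched in a few lines of standard-library Python. The function names and data are hypothetical, and modeling is omitted because it depends on the chosen model. At real scale these reductions would typically run inside a database or data-processing engine rather than a Python loop.

```python
import random
from collections import Counter, defaultdict


def sample(records, k, seed=0):
    """Sampling: k records drawn uniformly at random without replacement."""
    return random.Random(seed).sample(records, k)


def bin_counts(values, lo, hi, n_bins):
    """Binning: counts per equal-width bin over [lo, hi]; values outside are ignored."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            i = min(int((v - lo) / width), n_bins - 1)  # v == hi goes in the last bin
            counts[i] += 1
    return counts


def aggregate_mean(records, key, value):
    """Aggregation: replace many records with one mean per group."""
    sums, counts = defaultdict(float), Counter()
    for r in records:
        sums[r[key]] += r[value]
        counts[r[key]] += 1
    return {g: sums[g] / counts[g] for g in sorted(sums)}


data = list(range(100))                   # stand-in for a much larger dataset
print(len(sample(data, 10)))              # 10
print(bin_counts(data, 0, 100, 4))        # [25, 25, 25, 25]
print(aggregate_mean([{"g": "a", "v": 1}, {"g": "a", "v": 3}, {"g": "b", "v": 10}],
                     "g", "v"))           # {'a': 2.0, 'b': 10.0}
```

In each case the chart draws a small summary (10 records, 4 bars, 2 group means) instead of every raw record.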

25. Final Exam Summary

Most Important Points

  • EDA: Explore data before visualization.
  • Tukey (1977): Introduced EDA.
  • EDA Goals: Find distributions, outliers, correlations and quality issues.
  • Taxonomy of Interaction: Data & View Specification, View Manipulation, Process & Provenance.
  • Selection: Point selection and region selection.
  • Brushing & Linking: Highlight related data across views.
  • Dynamic Queries: Interactive filtering with immediate feedback.
  • Direct Manipulation: Pointing instead of typing.
  • Trellis Displays: Compare categories using multiple panels.
  • Big Data: Use sampling, binning, modeling and aggregation.